Claims

What is claimed is: 

1. A method for clustering a string, the string including a plurality of characters, the method 
including: 

identifying R unique n-grams T1...R in the string;

for every unique n-gram Ts:
    if the frequency of Ts in a set of n-gram statistics is not greater than a first threshold:
        associating the string with a cluster associated with Ts;
    otherwise:
        for every other n-gram Tv in the string T1...R, except s:
            if the frequency of n-gram Tv is greater than the first threshold:
                if the frequency of n-gram pair Ts-Tv is not greater than a second threshold:
                    associating the string with a cluster associated with the n-gram pair Ts-Tv;
                otherwise:
                    for every other n-gram Tx in the string T1...R, except s and v:
                        associating the string with a cluster associated with the n-gram triple Ts-Tv-Tx;
            otherwise:
                do nothing.
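The nested conditions of claim 1 may be easier to follow as code. The following is a minimal Python sketch, not the claimed implementation. It assumes character n-grams, frequency tables supplied as plain dictionaries, and unordered pair and triple keys stored as sorted tuples. The names `unique_ngrams`, `cluster_keys`, `ngram_freq`, `pair_freq`, `t1`, `t2` and `n` are hypothetical and do not appear in the claims.

```python
def unique_ngrams(s, n=3):
    """Return the unique character n-grams of s in first-seen order."""
    return list(dict.fromkeys(s[i:i + n] for i in range(len(s) - n + 1)))


def cluster_keys(s, ngram_freq, pair_freq, t1, t2, n=3):
    """Return the set of cluster keys the string s is associated with.

    ngram_freq maps an n-gram to its frequency; pair_freq maps a sorted
    (a, b) tuple to the frequency of that pair. Missing entries count as
    frequency 0 (an assumption of this sketch).
    """
    grams = unique_ngrams(s, n)
    keys = set()
    for ts in grams:
        if ngram_freq.get(ts, 0) <= t1:
            # Low-frequency n-gram: it defines a cluster on its own.
            keys.add((ts,))
            continue
        for tv in grams:
            if tv == ts:
                continue
            if ngram_freq.get(tv, 0) <= t1:
                # "otherwise: do nothing" branch: Tv is low frequency and is
                # handled when it takes the role of Ts.
                continue
            pair = tuple(sorted((ts, tv)))
            if pair_freq.get(pair, 0) <= t2:
                # Low-frequency pair of high-frequency n-grams.
                keys.add(pair)
            else:
                # High-frequency pair: fall back to triples.
                for tx in grams:
                    if tx != ts and tx != tv:
                        keys.add(tuple(sorted((ts, tv, tx))))
    return keys
```

Because the keys are sorted tuples, the pair reached as Ts-Tv and again as Tv-Ts maps to a single cluster. An implementation that treats the pair as ordered would keep the two apart.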

2. The method of claim 1 further including compiling n-gram statistics.

3. The method of claim 1 further including compiling n-gram pair statistics. 
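One way the statistics of claims 2 and 3 might be compiled is sketched below. The claims do not define "frequency"; this sketch assumes it means the number of strings in a corpus that contain the n-gram (or the pair). The function name `compile_statistics` and the parameter `n` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def compile_statistics(strings, n=3):
    """Count, over a corpus, how many strings contain each n-gram and each
    unordered pair of n-grams. Pairs are keyed as sorted (a, b) tuples."""
    ngram_freq = Counter()
    pair_freq = Counter()
    for s in strings:
        # Unique n-grams of this string, sorted so pairs come out ordered.
        grams = sorted({s[i:i + n] for i in range(len(s) - n + 1)})
        ngram_freq.update(grams)
        pair_freq.update(combinations(grams, 2))
    return ngram_freq, pair_freq
```

Counting each n-gram once per string keeps a long, repetitive string from inflating the counts. Counting raw occurrences instead would be an equally plausible reading of the claims.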

4. A method for clustering a plurality of strings, each string including a plurality of characters, the 
method including: 

identifying unique n-grams in each string; 
associating each string with clusters associated with low-frequency n-grams from that string, if any; and

associating each string with clusters associated with low-frequency pairs of high-frequency n-grams from that string, if any.

5. The method of claim 4 further including:

where a string does not include any low-frequency pairs of high-frequency n-grams, associating
that string with clusters associated with triples of n-grams including the pair.
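The two association steps of claim 4, applied to a plurality of strings, can be sketched as follows. This is an illustration under assumptions, not the claimed method: frequencies are taken as the number of strings containing an n-gram or pair, they are compiled from the same strings being clustered, and the triple fallback of claim 5 is omitted. The names `cluster_strings`, `t1`, `t2` and `n` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cluster_strings(strings, t1, t2, n=3):
    """Map each cluster key (a tuple of n-grams) to the strings assigned to it."""
    # Unique n-grams per string, sorted so that pair keys are canonical.
    per_string = [sorted({s[i:i + n] for i in range(len(s) - n + 1)})
                  for s in strings]
    ngram_freq = Counter(g for grams in per_string for g in grams)
    pair_freq = Counter(p for grams in per_string
                        for p in combinations(grams, 2))

    clusters = defaultdict(list)
    for s, grams in zip(strings, per_string):
        # Step 1: clusters keyed by low-frequency n-grams, if any.
        for g in grams:
            if ngram_freq[g] <= t1:
                clusters[(g,)].append(s)
        # Step 2: clusters keyed by low-frequency pairs of high-frequency n-grams.
        high = [g for g in grams if ngram_freq[g] > t1]
        for pair in combinations(high, 2):
            if pair_freq[pair] <= t2:
                clusters[pair].append(s)
    return dict(clusters)
```

A string can land in several clusters at once, which matches the plural "clusters" in the claim language.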

6. A method for clustering a string, the string including a plurality of characters, the method 
including:

identifying R unique n-grams T1...R in the string;

for every unique n-gram Ts:
    if the frequency of Ts in a set of n-gram statistics is not greater than a first threshold:
        associating the string with a cluster associated with Ts;
    otherwise:
        for i = 1 to Y:
            for every unique set of i n-grams Tu in the string T1...R, except s:
                if the frequency of the n-gram set Ts-Tu is not greater than a second threshold:
                    associating the string with a cluster associated with the n-gram set Ts-Tu;
        if the string has not been associated with a cluster with this value of Ts:
            for every unique set of Y+1 n-grams TuY in the string T1...R, except s:
                associating the string with a cluster associated with the Y+2 n-gram group Ts-TuY.

7. The method of claim 6 where Y = 1.

8. The method of claim 6 further including compiling n-gram statistics.

9. The method of claim 6 further including compiling n-gram group statistics.
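The generalized procedure of claim 6, in which sets of up to Y additional n-grams are tried before falling back to groups of Y+2, can be sketched as below. This is a hedged illustration: group keys are assumed to be unordered (sorted tuples), group frequencies are supplied as a dictionary in which missing entries count as 0, and the names `cluster_keys_general`, `group_freq`, `t1`, `t2`, `y` and `n` are hypothetical.

```python
from itertools import combinations


def cluster_keys_general(s, ngram_freq, group_freq, t1, t2, y=1, n=3):
    """Return the cluster keys for string s under the generalized scheme."""
    grams = list(dict.fromkeys(s[i:i + n] for i in range(len(s) - n + 1)))
    keys = set()
    for ts in grams:
        if ngram_freq.get(ts, 0) <= t1:
            # Low-frequency n-gram: it defines a cluster on its own.
            keys.add((ts,))
            continue
        others = [g for g in grams if g != ts]
        matched = False  # has this Ts produced any cluster yet?
        for i in range(1, y + 1):
            # Every unique set of i other n-grams, combined with Ts.
            for tu in combinations(others, i):
                group = tuple(sorted((ts, *tu)))
                if group_freq.get(group, 0) <= t2:
                    keys.add(group)
                    matched = True
        if not matched:
            # Fallback: groups of Y+2 n-grams, with no frequency test.
            for tuy in combinations(others, y + 1):
                keys.add(tuple(sorted((ts, *tuy))))
    return keys
```

With `y=1` this corresponds to claim 7. Unlike claim 1, the procedure as claimed does not test whether the other n-grams in the set are themselves high frequency, and the sketch follows that reading.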

10. A computer program, stored on a tangible storage medium, for use in clustering a string, the
program including executable instructions that cause a computer to: 

identify R unique n-grams T1...R in the string;

for every unique n-gram Ts:
    if the frequency of Ts in a set of n-gram statistics is not greater than a first threshold:
        associate the string with a cluster associated with Ts;
    otherwise:
        for every other n-gram Tv in the string T1...R, except s:
            if the frequency of n-gram Tv is greater than the first threshold:
                if the frequency of n-gram pair Ts-Tv is not greater than a second threshold:
                    associate the string with a cluster associated with the n-gram pair Ts-Tv;
                otherwise:
                    for every other n-gram Tx in the string T1...R, except s and v:
                        associate the string with a cluster associated with the n-gram triple Ts-Tv-Tx;
            otherwise:
                do nothing.

11. The computer program of claim 10 further including executable instructions that cause a 
computer to compile n-gram statistics. 

12. The computer program of claim 10 further including executable instructions that cause a 
computer to compile n-gram pair statistics. 